Gender and Racial Bias


Evaluation of Bias Towards Medical Professionals in Large Language Models

Chen, Xi, Xu, Yang, You, MingKe, Wang, Li, Liu, WeiZhi, Li, Jian

arXiv.org Artificial Intelligence

This study evaluates whether large language models (LLMs) exhibit biases towards medical professionals. Fictitious candidate resumes were created that varied identity factors while holding qualifications constant. Three LLMs (GPT-4, Claude-3-haiku, and Mistral-Large) were tested using a standardized prompt to evaluate resumes for specific residency programs. Explicit bias was tested by changing stated gender and race information, while implicit bias was tested by changing names while hiding race and gender. Physician data from the Association of American Medical Colleges was used for comparison with real-world demographics. In total, 900,000 resumes were evaluated. All LLMs exhibited significant gender and racial biases across medical specialties. Gender preferences varied: male candidates were favored in surgery and orthopedics, while female candidates were preferred in dermatology, family medicine, obstetrics and gynecology, pediatrics, and psychiatry. Claude-3 and Mistral-Large generally favored Asian candidates, while GPT-4 preferred Black and Hispanic candidates in several specialties. The tests also revealed strong preferences for Hispanic female and Asian male candidates in various specialties. Compared with real-world data, the LLMs consistently selected higher proportions of female and underrepresented racial candidates than their actual representation in the medical workforce. GPT-4, Claude-3, and Mistral-Large showed significant gender and racial biases when evaluating medical professionals for residency selection. These findings highlight the potential for LLMs to perpetuate biases and compromise healthcare workforce diversity if used without proper bias mitigation strategies.
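
A minimal sketch of this kind of counterfactual resume audit is shown below; the prompt wording, resume text, specialty list, and query_llm() stub are illustrative assumptions, not the study's released code.

# Hypothetical sketch of a counterfactual resume audit, not the authors' code.
# query_llm() is a placeholder for a call to GPT-4, Claude-3-haiku, or Mistral-Large.
from itertools import product

GENDERS = ["male", "female"]
RACES = ["White", "Black", "Hispanic", "Asian"]
SPECIALTIES = ["surgery", "orthopedics", "dermatology", "pediatrics", "psychiatry"]

BASE_RESUME = "MD degree, board scores at the national mean, two research publications."

def query_llm(prompt: str) -> str:
    """Placeholder: swap in a real API call to the model under test."""
    return "ACCEPT"  # dummy response so the sketch runs end to end

def explicit_bias_prompt(specialty: str, gender: str, race: str) -> str:
    # Explicit condition: gender and race stated directly on an otherwise identical resume.
    return (f"Evaluate this candidate for a {specialty} residency. "
            f"Candidate is a {race} {gender}. Resume: {BASE_RESUME} "
            f"Answer ACCEPT or REJECT.")

def run_audit() -> dict:
    decisions = {}
    for specialty, gender, race in product(SPECIALTIES, GENDERS, RACES):
        decisions[(specialty, gender, race)] = query_llm(
            explicit_bias_prompt(specialty, gender, race))
    return decisions

if __name__ == "__main__":
    results = run_audit()
    # Selection rates per identity group would then be compared with AAMC workforce data.
    print(len(results), "prompts evaluated")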


Bias in Generative AI

Zhou, Mi, Abhishek, Vibhanshu, Derdenger, Timothy, Kim, Jaymo, Srinivasan, Kannan

arXiv.org Artificial Intelligence

This study analyzed images generated by three popular generative artificial intelligence (AI) tools - Midjourney, Stable Diffusion, and DALL-E 2 - representing various occupations to investigate potential bias in AI generators. Our analysis revealed two overarching areas of concern in these AI generators: (1) systematic gender and racial biases, and (2) subtle biases in facial expressions and appearances. First, we found that all three AI generators exhibited bias against women and African Americans. Moreover, the gender and racial biases uncovered in our analysis were even more pronounced than the status quo reflected in labor force statistics or Google images, intensifying the harmful biases we are actively striving to rectify in our society. Second, our study uncovered more nuanced prejudices in the portrayal of emotions and appearances. For example, women were depicted as younger, with more smiles and happiness, while men were depicted as older, with more neutral expressions and anger, posing a risk that generative AI models may unintentionally depict women as more submissive and less competent than men. Such nuanced biases, by their less overt nature, may be more problematic, as they can permeate perceptions unconsciously and may be more difficult to rectify. Although the extent of bias varied depending on the model, the direction of bias remained consistent in both commercial and open-source AI generators. As these tools become commonplace, our study highlights the urgency of identifying and mitigating the various biases in generative AI, reinforcing the commitment to ensuring that AI technologies benefit all of humanity for a more inclusive future.
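
As a rough illustration of how such occupational skew can be quantified, the sketch below compares the share of women in a set of generated images against a labor-force reference using a one-sample proportion test; all counts and the reference share are made-up placeholders, not figures from the paper.

# Illustrative sketch (not the paper's code): compare the share of women depicted
# by an image generator for one occupation against a labor-force reference share.
from math import sqrt, erf

def one_sample_proportion_z(p_obs: float, n_obs: int, p_ref: float):
    """z-test of an observed proportion against a fixed reference proportion."""
    se = sqrt(p_ref * (1 - p_ref) / n_obs)
    z = (p_obs - p_ref) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided, normal approx.
    return z, p_value

# Placeholder numbers: 12 of 100 generated "software developer" images depict women,
# versus an assumed 27% share of women in that occupation's labor-force statistics.
z, p = one_sample_proportion_z(p_obs=12 / 100, n_obs=100, p_ref=0.27)
print(f"z = {z:.2f}, p = {p:.4f}")  # a large negative z indicates under-representation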


Worried about AI? How California lawmakers plan to tackle the technology's risks in 2024

Los Angeles Times

Jodi Long was caught off guard by the cage filled with cameras meant to capture images of her face and body. "I was a little freaked out because, before I walked in there, I said I don't remember this being in my contract," the actor said. The filmmakers needed her digital scan, Long was told, because they wanted to make sure her arms were positioned correctly in a scene where she holds a computer-generated character. That moment in 2020 stuck with Long, president of SAG-AFTRA's Los Angeles local, while she was negotiating for protections around the use of artificial intelligence when actors went on strike. In November, the actors guild reached a deal with Hollywood studios that -- among other things -- required consent and compensation for the use of a worker's digital replica.


Aligning with Whom? Large Language Models Have Gender and Racial Biases in Subjective NLP Tasks

Sun, Huaman, Pei, Jiaxin, Choi, Minje, Jurgens, David

arXiv.org Artificial Intelligence

Human perception of language depends on personal backgrounds like gender and ethnicity. While existing studies have shown that large language models (LLMs) hold values closer to those of certain societal groups, it is unclear whether their prediction behaviors on subjective NLP tasks also exhibit a similar bias. In this study, leveraging the POPQUORN dataset, which contains annotations from diverse demographic backgrounds, we conduct a series of experiments on four popular LLMs to investigate their capability to understand group differences and potential biases in their predictions of politeness and offensiveness. We find that for both tasks, model predictions are closer to the labels from White and female participants. We further explore prompting with the target demographic labels and show that including the target demographic in the prompt actually worsens the model's performance. More specifically, when prompted to respond from the perspective of "Black" and "Asian" individuals, models show lower performance in predicting both the overall scores and the scores from the corresponding groups. Our results suggest that LLMs hold gender and racial biases for subjective NLP tasks and that demographic-infused prompts alone may be insufficient to mitigate such effects. Code and data are available at https://github.com/Jiaxin-Pei/LLM-Group-Bias.
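
The demographic-infused prompting the abstract describes can be sketched as follows; the exact prompt wording and the rate_offensiveness() stub are assumptions rather than the paper's materials.

# Minimal sketch of demographic-infused prompting for a subjective rating task.
def rate_offensiveness(prompt: str) -> int:
    """Placeholder for an LLM call that returns a 1-5 offensiveness rating."""
    return 3  # dummy value so the sketch runs

COMMENT = "That's a ridiculous thing to say."

# Baseline prompt: no demographic information.
baseline = f"Rate the offensiveness of this comment from 1 to 5: \"{COMMENT}\""
print("baseline", rate_offensiveness(baseline))

# Demographic-infused prompts: ask the model to answer from a group's perspective.
for group in ["a Black person", "an Asian person", "a White person"]:
    infused = (f"Answer as {group} would. "
               f"Rate the offensiveness of this comment from 1 to 5: \"{COMMENT}\"")
    print(group, rate_offensiveness(infused))

# The study compares such predictions with POPQUORN annotations from each group
# and finds that adding the demographic label tends to hurt, not help, accuracy.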


ChatGPT Exhibits Gender and Racial Biases in Acute Coronary Syndrome Management

Zhang, Angela, Yuksekgonul, Mert, Guild, Joshua, Zou, James, Wu, Joseph C.

arXiv.org Artificial Intelligence

Recent breakthroughs in large language models (LLMs) have led to their rapid dissemination and widespread use. One early application has been in medicine, where LLMs have been investigated to streamline clinical workflows and facilitate clinical analysis and decision-making. However, a leading barrier to the deployment of artificial intelligence (AI), and of LLMs in particular, has been concern about embedded gender and racial biases. Here, we evaluate whether a leading LLM, ChatGPT 3.5, exhibits gender and racial bias in the clinical management of acute coronary syndrome (ACS). We find that specifying patients as female, African American, or Hispanic resulted in a decrease in guideline-recommended medical management, diagnosis, and symptom management of ACS. Most notably, the largest disparities were seen in the recommendation of coronary angiography or stress testing for the diagnosis and further intervention of ACS, and in the recommendation of high-intensity statins. These disparities correlate with biases that have been observed clinically and have been implicated in the differential gender and racial morbidity and mortality outcomes of ACS and coronary artery disease. Furthermore, we find that the largest disparities arise in unstable angina, where fewer explicit clinical guidelines exist. Finally, we find that asking ChatGPT 3.5 to explain its reasoning before providing an answer improves clinical accuracy and mitigates instances of gender and racial bias. This is among the first studies to demonstrate that the gender and racial biases LLMs exhibit do in fact affect clinical management. Additionally, we demonstrate that existing strategies for improving LLM performance not only improve performance in clinical management but can also be used to mitigate gender and racial biases.
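
The contrast between direct answering and the reasoning-first mitigation might look roughly like the sketch below; the vignette text, demographic descriptors, and ask() stub are illustrative, not the study's exact prompts.

# Hedged sketch of the two prompting styles contrasted in the abstract.
def ask(prompt: str) -> str:
    """Placeholder for a ChatGPT 3.5 call."""
    return "Recommend coronary angiography and a high-intensity statin."

VIGNETTE = ("A 62-year-old {descriptor} patient presents with chest pressure, "
            "diaphoresis, and a troponin at the upper limit of normal. ")

for descriptor in ["white male", "female", "African American", "Hispanic"]:
    case = VIGNETTE.format(descriptor=descriptor)

    # Direct answer: the style under which the disparities were observed.
    direct = ask(case + "What diagnostic workup and medications do you recommend?")

    # Reasoning-first: the mitigation the abstract describes, asking the model to
    # explain its reasoning before committing to a recommendation.
    reasoned = ask(case + "First explain your clinical reasoning step by step, "
                          "then state your recommended workup and medications.")

    print(descriptor, "|", direct[:40], "|", reasoned[:40])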


Common names in Burkina Faso, West-Africa

#artificialintelligence

Burkina Faso is a multi-cultural and diverse country with a rich history. In this article, we explore how personal names can be interpreted to reflect regional and ethnic affiliation within the country. We then illustrate how the use of a personal name can affect a black-box artificial intelligence, such as OpenAI's DALL-E. This is the first article in our series of blog posts with the tag #thisnamedpersondoesnotexist.
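
A name-swap probe of the kind the post describes could look like this minimal sketch; generate_image() is a placeholder for a call to a black-box generator such as DALL-E, and the names are illustrative placeholders only.

# Hypothetical name probe: identical prompts that differ only in the personal name.
def generate_image(prompt: str) -> str:
    """Placeholder: would call an image-generation API and return a file path."""
    return f"images/{abs(hash(prompt)) % 10_000}.png"

NAMES = ["Aminata Ouédraogo", "Moussa Sawadogo"]  # placeholder Burkinabè-style names

for name in NAMES:
    # Any systematic change in the depicted person across otherwise identical
    # prompts can then be attributed to the name alone.
    path = generate_image(f"A portrait photo of {name}, a software engineer")
    print(name, "->", path)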


New Study Warns Of Gender And Racial Biases In Robots - AI Summary

#artificialintelligence

Read the complete article at: www.unite.ai


Gender and Racial Bias in Visual Question Answering Datasets

#artificialintelligence

Vision-and-language tasks have drawn increasing attention as a means of evaluating human-like reasoning in machine learning models. A popular task in the field is visual question answering (VQA), which aims to answer questions about images. However, VQA models have been shown to exploit language bias by learning the statistical correlations between questions and answers without looking at the image content: e.g., questions about the color of a banana are answered with "yellow", even if the banana in the image is green. If societal bias (e.g., sexism, racism, ableism) is present in the training data, this shortcut may cause VQA models to learn harmful stereotypes. For this reason, we investigate gender and racial bias in five VQA datasets.
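
The question-only shortcut the authors describe can be checked with a simple answer-prior computation, sketched below on toy records rather than a real VQA dataset.

# Small sketch of a language-bias probe: how predictable is the answer from
# question-answer statistics alone, without ever looking at an image?
from collections import Counter, defaultdict

# Toy records standing in for (question, ground-truth answer) pairs from a VQA dataset.
records = [
    {"question": "What color is the banana?", "answer": "yellow"},
    {"question": "What color is the banana?", "answer": "yellow"},
    {"question": "What color is the banana?", "answer": "green"},
    {"question": "Who is cooking?", "answer": "woman"},
    {"question": "Who is cooking?", "answer": "woman"},
]

priors = defaultdict(Counter)
for r in records:
    priors[r["question"]][r["answer"]] += 1

for question, counts in priors.items():
    answer, n = counts.most_common(1)[0]
    print(f"{question!r}: majority answer {answer!r} "
          f"({n}/{sum(counts.values())} correct without any image)")
# Skewed priors such as 'woman' for cooking questions are where societal bias enters.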


Understanding Gender and Racial Bias in AI, Part 2 :: UXmatters

#artificialintelligence

How do algorithmic bias, our design tools, and bad habits contribute to the whitewashing of design? The everyday tools we use to navigate our daily lives and our design work drive us toward creating design solutions that are similar to those we already know and like. As a tech community that is largely driven by white people, we are constantly served images of white people. These white faces and stories end up in our personas and user-experience maps and drive our design decision making. Such bias will persist unless we acknowledge this is happening and stop the whitewashing of our design deliverables and our design solutions.


Artificial Intelligence Has a Problem With Gender and Racial Bias

#artificialintelligence

I experienced this firsthand, when I was a graduate student at MIT in 2015 and discovered that some facial analysis software couldn't detect my dark-skinned face until I put on a white mask. These systems are often trained on images of predominantly light-skinned men. And so, I decided to share my experience of the coded gaze, the bias in artificial intelligence that can lead to discriminatory or exclusionary practices. Altering myself to fit the norm--in this case better represented by a white mask than my actual face--led me to realize the impact of the exclusion overhead, a term I coined to describe the cost of systems that don't take into account the diversity of humanity. How much does a person have to change themselves to function with technological systems that increasingly govern our lives? We often assume machines are neutral, but they aren't.